Implementing Spinlock for RISC-V OS in Rust
A spinlock is one of the most basic synchronization primitives and one of the first components to consider when implementing an OS.
This article briefly reviews the basics of spinlocks, shows how to implement one for a self-made OS in Rust, and discusses the advantages over doing the same in C.
All code examples are written in Rust unless otherwise noted.
The need for synchronization
When multiple processes run in parallel and touch shared data, controlling their mutual exclusion becomes a problem. For example, suppose each process executes code that modifies the shared data. If those code blocks can run independently and without restriction, the resulting data depends unpredictably on the timing of the reads and writes. This is called a race condition.
Code that can cause a race condition is called a critical section. Synchronization primitives such as the spinlock exist to protect critical sections.
A spinlock uses a loop and atomic CPU instructions to limit the number of processes that can execute the critical section to at most one. The name "spin" comes from the looping.
In Rust, a spinlock is used in the following way. Here, it is exposed as a data structure called Mutex (MUTual EXclusion).
let m: Mutex<usize> = Mutex::new(5);
{
    // critical section
    let mut val = m.lock(); // acquire lock
    *val = 6;
} // release lock
The Mutex wraps and protects data of type usize. The protected data cannot be accessed unless the lock is acquired with lock(); if another process holds the lock, lock() waits in a loop. The lock is then released by drop when the guard goes out of scope.
Let's also look at an example of a spinlock in C for comparison. In general, C manages the state of the lock with a lock-management structure or shared variable, and functions such as acquire() and release() acquire and release the lock.
struct lock vallock;
int m = 5;
acquire(&vallock);
// critical section
m = 6;
release(&vallock);
In C, there is always the risk of forgetting to release a lock, because the code to acquire and release it must be inserted by hand. It is also more dangerous than Rust because the shared data can be accessed without acquiring the lock.
Now that we have an overview of spinlocks, let's look at them from the perspective of atomic operations.
Atomic Operation
To put atomic operations in simple terms: an operation that requires multiple memory accesses is combined into a single step whose intermediate state cannot be observed by the rest of the system. There are only two possible outcomes: either the whole sequence of changes succeeds, or it fails and nothing changes.
The CPU provides instructions for atomic operations, and these are what spinlocks are built on.
In the case of RISC-V, there are the AMO (Atomic Memory Operation) instructions and the Load-Reserved/Store-Conditional (lr/sc) instructions, both of which can perform or implement operations with the following property: no interruption is possible between the memory read and the write, and other processors cannot modify the memory value while the operation is in progress.
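In Rust these CPU primitives surface as methods on the atomic types. For example, AtomicUsize::fetch_add is one indivisible read-modify-write (on RISC-V it can compile down to an AMO instruction such as amoadd), so concurrent increments are never lost. A small hosted demonstration (the `count` helper is my own illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Increment a shared counter from `threads` threads, `per` times each.
fn count(threads: usize, per: usize) -> usize {
    let c = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per {
                    // One atomic read-modify-write; no other core can
                    // interleave between the read and the write.
                    c.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    c.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", count(4, 100_000)); // prints 400000
}
```

Had the body been the non-atomic `c = c + 1`, some increments would be overwritten by racing threads.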
One atomic operation commonly used in spinlocks is TAS (Test And Set). Let's see what types and functions Rust provides for this operation, compared with C.
Test and Set (TAS)
TAS compares and assigns a value atomically. The value can then serve as a flag, allowing conditional branching to be implemented atomically. The following pseudo code, written in Rust, shows the semantics of TAS (note that this function as written is not actually atomic):
fn test_and_set(v: &mut bool) -> bool {
    if *v {
        true
    } else {
        *v = true;
        false
    }
}
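Executed as ordinary code, another thread could interleave between the test and the set. A host-runnable atomic version can be built on AtomicBool::swap, which stores true and returns the previous value in one indivisible step:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Atomic test-and-set: store `true` and return the old value,
// all in a single atomic step (AtomicBool::swap).
fn test_and_set(v: &AtomicBool) -> bool {
    v.swap(true, Ordering::Acquire)
}

fn main() {
    let flag = AtomicBool::new(false);
    assert!(!test_and_set(&flag)); // old value was false: flag acquired
    assert!(test_and_set(&flag));  // old value was true: someone holds it
    flag.store(false, Ordering::Release); // clear the flag again
    assert!(!test_and_set(&flag)); // acquirable once more
    println!("ok");
}
```

Exactly one caller observes false per release, no matter how many threads race on the flag.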
C compilers usually provide TAS as the builtin __sync_lock_test_and_set(type *ptr, type value). The flag set by this function can be cleared with __sync_lock_release(type *ptr), which simply writes false to *ptr. As you can see, C gives you no dedicated types, just compiler builtins.
Rust, on the other hand, has dedicated types: AtomicUsize for usize, AtomicBool for bool, and so on. Moreover, these atomic types provide the atomic operations as methods.
Of course, each Atomic type provides compare_exchange(), which covers the role of the C builtin __sync_lock_test_and_set() (strictly speaking, swap() is the direct equivalent, while compare_exchange() is a compare-and-swap), and store(), which is equivalent to __sync_lock_release(). See the AtomicBool documentation.
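A quick illustration of compare_exchange's semantics: it writes the new value only when the current value matches the expected one, and the Result tells you which case occurred (Ok carries the previous value on success, Err the actual value on failure):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    let locked = AtomicBool::new(false);

    // Succeeds: the value was `false`, so `true` is written.
    // Ok carries the previous value.
    let r = locked.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed);
    assert_eq!(r, Ok(false));

    // Fails: the value is already `true`; nothing is written.
    // Err carries the actual current value.
    let r = locked.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed);
    assert_eq!(r, Err(true));

    // store() clears the flag, playing the role of __sync_lock_release in C.
    locked.store(false, Ordering::Release);
    assert!(!locked.load(Ordering::Relaxed));
    println!("ok");
}
```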
Now, let's implement a simple spinlock using AtomicBool.
Simple Spinlock
Let's start with a bad example. Consider the following Mutex structure representing a spinlock:
pub struct Mutex<T> {
    locked: UnsafeCell<bool>, // Is the lock held?
    data: UnsafeCell<T>,      // actual data
}

pub struct MutexGuard<'a, T: 'a> {
    mutex: &'a Mutex<T>,
}
The important field in this structure is locked, which is false if the lock is available and true if it is held.
Note that UnsafeCell, which wraps locked, provides unrestricted interior mutability; we use it here only because it is otherwise difficult to write this bad example in Rust. See the documentation if you are interested.
UnsafeCell also wraps data, but that use can be made safe by locked under a correct implementation.
The following method can be used to acquire a lock:
impl<T> Mutex<T> {
    pub fn lock(&self) -> MutexGuard<'_, T> {
        let lk = unsafe { &mut *self.locked.get() };
        loop {
            if !*lk {
                *lk = true;
                break MutexGuard { mutex: self };
            }
        }
    }
}
It waits in a loop until it acquires the lock. When the lock is acquired, it returns a MutexGuard structure, which allows access to the internal data through deref() and deref_mut(). As long as the MutexGuard is alive, the lock is held.
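The deref()/deref_mut() implementations are not shown in the listings here; a minimal sketch of what they might look like (hypothetical, but matching the structures above) follows. Holding a MutexGuard is what makes the raw pointer dereference sound:

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};

pub struct Mutex<T> {
    locked: UnsafeCell<bool>, // Is the lock held?
    data: UnsafeCell<T>,      // actual data
}

pub struct MutexGuard<'a, T> {
    mutex: &'a Mutex<T>,
}

impl<'a, T> Deref for MutexGuard<'a, T> {
    type Target = T;
    // Sound only because a guard exists solely while the lock is held.
    fn deref(&self) -> &T {
        unsafe { &*self.mutex.data.get() }
    }
}

impl<'a, T> DerefMut for MutexGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.mutex.data.get() }
    }
}

fn main() {
    let m = Mutex { locked: UnsafeCell::new(true), data: UnsafeCell::new(5usize) };
    // Constructed directly here only to demonstrate Deref;
    // normally a guard is obtained via lock().
    let mut g = MutexGuard { mutex: &m };
    *g = 6; // DerefMut gives `&mut T`
    println!("{}", *g); // prints 6
}
```

On a correct lock the guard can only be obtained from lock(), so these unsafe dereferences never alias a second mutable borrow.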
This implementation seems to work, but unfortunately it cannot guarantee mutual exclusion on a multiprocessor.
Two CPUs may reach the if condition simultaneously, both find that lk is false, both assign true to lk on the following line, and both acquire the lock. At that point mutual exclusion is broken.
To fix this, the if check and the assignment of true must execute as one atomic (i.e., indivisible) step. This is where the AtomicBool type mentioned earlier and its compare_exchange() method come in:
#[repr(C, align(1))]
pub struct AtomicBool { /* fields omitted */ }

pub fn compare_exchange(
    &self,
    current: bool,
    new: bool,
    success: Ordering,
    failure: Ordering
) -> Result<bool, bool>
This function is implemented with RISC-V atomic instructions (a load-reserved/store-conditional sequence). It writes new if the current value equals current, and the return value indicates whether new was written.
A correct Mutex implementation looks like this:
pub struct Mutex<T> {
    locked: AtomicBool,  // Is the lock held?
    data: UnsafeCell<T>, // actual data
}

impl<T> Mutex<T> {
    pub fn lock(&self) -> MutexGuard<'_, T> {
        loop {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                break MutexGuard { mutex: self };
            }
        }
    }
}
This implementation checks and sets the value atomically, overcoming the bad example, in which the check and the set were separate operations and so could not be performed under mutual exclusion.
For unlocking, the Drop trait can be implemented for MutexGuard as follows, so the lock is released automatically when the MutexGuard goes out of scope:
impl<'a, T: 'a> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        self.mutex.locked.store(false, Ordering::Release);
    }
}
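Putting the pieces together, here is a host-runnable sketch of this Mutex using std instead of a kernel crate; the Sync marker and the Deref impls are additions not shown in the listings above, and the whole thing is an illustration rather than the OS code itself:

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

pub struct Mutex<T> {
    locked: AtomicBool,  // Is the lock held?
    data: UnsafeCell<T>, // actual data
}

// Safety: access to `data` is serialized by `locked`.
unsafe impl<T: Send> Sync for Mutex<T> {}

pub struct MutexGuard<'a, T> {
    mutex: &'a Mutex<T>,
}

impl<T> Mutex<T> {
    pub const fn new(data: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }
    pub fn lock(&self) -> MutexGuard<'_, T> {
        // Spin until we flip `locked` from false to true atomically.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {}
        MutexGuard { mutex: self }
    }
}

impl<'a, T> Deref for MutexGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T { unsafe { &*self.mutex.data.get() } }
}
impl<'a, T> DerefMut for MutexGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T { unsafe { &mut *self.mutex.data.get() } }
}
impl<'a, T> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        // Release the lock when the guard goes out of scope.
        self.mutex.locked.store(false, Ordering::Release);
    }
}

fn main() {
    static M: Mutex<u64> = Mutex::new(0);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..10_000 {
                    *M.lock() += 1; // lock, modify, unlock on drop
                }
            });
        }
    });
    println!("{}", *M.lock()); // prints 40000
}
```

Note how the critical section is a single expression: the guard is a temporary, so the lock is released at the end of the statement.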
A conservative spinlock
There is a variant of the spinlock that disables interrupts when acquiring a lock and re-enables them only when all locks have been released, to prevent deadlocks caused by interrupts. We will call this a conservative spinlock; let's see how to implement it.
First, if we think of disabling and enabling interrupts as locking the CPU's interrupt capability, we can model the mechanism as an "interrupt lock" on the CPU. Since interrupts must be re-enabled only once all "interrupt locks" are released, we need to remember the number of outstanding locks to handle nested critical sections.
So here is a Rust-like interface for this "interrupt lock": it disables interrupts in lock(), counts how many times they have been disabled, and re-enables interrupts when that count returns to zero in drop().
Since disabling interrupts is per-CPU state, we manage it with a Cpu structure:
pub struct Cpus([UnsafeCell<Cpu>; NCPU]);

// Per-CPU state
pub struct Cpu {
    pub noff: UnsafeCell<isize>, // Depth of interrupts off
    pub intena: bool,            // Were interrupts enabled before interrupts off?
}
Since the target is a multiprocessor, there are multiple CPU cores: the Cpu structure manages one core, and the Cpus structure manages them all. noff holds the number of times interrupts have been turned off, and intena records whether interrupts were enabled before they were first turned off.
Think of the following IntrLock as the MutexGuard of this scheme: acquiring an "interrupt lock" on the CPU yields this structure, and as long as any such structure is alive, interrupts stay disabled. Conversely, once none of them exist, interrupts are re-enabled.
pub struct IntrLock<'a> {
    cpu: &'a Cpu,
}
noff counts the number of live IntrLocks, and lock() and drop() increment and decrement it.
The implementation of the "interrupt lock" looks like the following. Calling intr_lock() on Cpus fetches the Cpu structure of the core the process is currently running on and disables interrupts. Calling lock() on Cpu then updates the disable count and returns an IntrLock. When an IntrLock is dropped, it calls Cpu.unlock(), which decrements the count and re-enables interrupts once the count reaches zero. In other words, as long as even one IntrLock exists, interrupts remain disabled.
impl Cpus {
    // Return a mutable reference to this core's Cpu struct.
    // Interrupts must be disabled.
    pub unsafe fn my_cpu(&self) -> &mut Cpu {
        let id = Self::cpu_id();
        &mut *self.0[id].get()
    }

    // intr_lock() disables interrupts on my_cpu().
    pub fn intr_lock(&self) -> IntrLock {
        let old = intr_get(); // get status of interrupts
        intr_off(); // disable interrupts
        unsafe { self.my_cpu().lock(old) }
    }
}

impl Cpu {
    // Interrupts must be disabled.
    unsafe fn lock(&mut self, old: bool) -> IntrLock {
        if *self.noff.get() == 0 {
            self.intena = old;
        }
        *self.noff.get() += 1;
        IntrLock { cpu: self }
    }

    // Interrupts must be disabled.
    unsafe fn unlock(&self) {
        assert!(!intr_get(), "unlock - interruptible");
        let noff = self.noff.get();
        assert!(*noff >= 1, "unlock");
        *noff -= 1;
        if *noff == 0 && self.intena {
            intr_on()
        }
    }
}

impl<'a> Drop for IntrLock<'a> {
    fn drop(&mut self) {
        unsafe {
            self.cpu.unlock();
        }
    }
}
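Here intr_get(), intr_off(), and intr_on() are assumed to be RISC-V CSR accessors (reading and toggling the interrupt-enable bit of sstatus), so that code cannot run on a host. The nesting bookkeeping itself, however, can be sketched and exercised on a host by simulating the interrupt flag with an AtomicBool; push_off/pop_off below are stand-in names for lock()/drop(), and the whole block is an illustration, not the kernel code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Simulated interrupt-enable flag, standing in for the SIE bit of sstatus.
static INTR_ENABLED: AtomicBool = AtomicBool::new(true);

fn intr_get() -> bool {
    INTR_ENABLED.load(Ordering::Relaxed)
}
fn intr_off() {
    INTR_ENABLED.store(false, Ordering::Relaxed);
}
fn intr_on() {
    INTR_ENABLED.store(true, Ordering::Relaxed);
}

// Per-CPU bookkeeping; a single core is enough for the sketch.
struct Cpu {
    noff: isize,  // depth of nested "interrupt locks"
    intena: bool, // were interrupts enabled before the first lock?
}

impl Cpu {
    fn push_off(&mut self) {
        let old = intr_get();
        intr_off();
        if self.noff == 0 {
            self.intena = old; // remember the pre-lock state only once
        }
        self.noff += 1;
    }

    fn pop_off(&mut self) {
        assert!(self.noff >= 1, "pop_off without matching push_off");
        self.noff -= 1;
        // Re-enable only when the outermost "interrupt lock" is released.
        if self.noff == 0 && self.intena {
            intr_on();
        }
    }
}

fn main() {
    let mut cpu = Cpu { noff: 0, intena: false };
    cpu.push_off(); // outer critical section: interrupts go off
    cpu.push_off(); // nested critical section
    cpu.pop_off();
    assert!(!intr_get()); // one lock still held: interrupts stay off
    cpu.pop_off();
    assert!(intr_get()); // last lock released: interrupts back on
    println!("ok");
}
```

The key property exercised here is that the inner pop_off does not re-enable interrupts; only the outermost release does, and only if they were enabled to begin with.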
With the code above, interrupts are disabled by calling intr_lock() on Cpus, and re-enabling is driven by dropping the returned IntrLock, so you cannot forget to re-enable interrupts after disabling them.
With this "interrupt lock" in place, implementing a conservative spinlock is straightforward.
Acquire an IntrLock while acquiring the Mutex lock, and store it in the MutexGuard. That way, interrupts are disabled while any spinlock is held and re-enabled when the last spinlock is released by drop().
pub struct Mutex<T> {
    locked: AtomicBool,  // Is the lock held?
    data: UnsafeCell<T>, // actual data
}

pub struct MutexGuard<'a, T: 'a> {
    mutex: &'a Mutex<T>,
    _intr_lock: IntrLock<'a>,
}

impl<T> Mutex<T> {
    pub fn lock(&self) -> MutexGuard<'_, T> {
        let _intr_lock = CPUS.intr_lock(); // disable interrupts to avoid deadlock
        loop {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                break MutexGuard {
                    mutex: self,
                    _intr_lock,
                };
            }
        }
    }
}
It is important to call intr_lock() before compare_exchange(). In the opposite order there would be a brief window in which the lock is held with interrupts enabled, and an unfortunately timed interrupt could deadlock the system. Likewise, interrupts must be re-enabled only after store(); thanks to Rust's drop order (Drop::drop runs before the guard's fields, including _intr_lock, are dropped), this always happens at the right time.
OS Development with Rust
If you implement a spinlock in C that also disables interrupts, there is always the danger of forgetting to release the lock or to re-enable interrupts. For example, when the C counterpart of intr_lock() is called outside the spinlock, nothing does the post-processing for you, so a matching intr_unlock() must be called by hand every time. Developing an OS in Rust avoids this class of problem. For this and its various other advantages, I think Rust is well suited to OS development.
You can find the OS code I am implementing, including the complete implementation of the spinlock, in this repo.